HIGH DIMENSIONAL MATRIX ESTIMATION WITH 
UNKNOWN VARIANCE OF THE NOISE 



OLGA KLOPP 



Abstract. Assume that we observe a small set of entries or linear
combinations of entries of an unknown matrix $A_0$ corrupted
by noise. We propose a new method for estimating $A_0$ which does
not rely on the knowledge or an estimation of the standard deviation
of the noise $\sigma$. Our estimator achieves, up to a logarithmic
factor, optimal rates of convergence under the Frobenius risk and,
thus, has the same prediction performance as previously proposed
estimators which rely on the knowledge of $\sigma$.



1. Introduction 

The problem of the recovery of a data matrix from incomplete and 
corrupted information appears in a variety of applications such as rec- 
ommendation systems, system identification, global positioning, re- 
mote sensing (for more details see [3]). For instance, in the Netflix 
recommendation system, we observe a few movie ratings from a large 
data matrix in which rows are users and columns are movies. Each user 
only watches a few movies compared to the total database of movies 
available on Netflix. The goal is to predict the missing ratings in order
to recommend to each user movies that he or she has not yet seen.

In the noiseless setting, if the unknown matrix has low rank and is 
"incoherent" , then it can be reconstructed exactly with high probability 
from a small set of entries. This result was first proved by Candes and 
Recht [4] using nuclear norm minimization. A tighter analysis of the 
same convex relaxation was carried out in [5]. For a simpler approach 
see [16] and [8]. An alternative line of work was developed by Keshavan 
et al in [10]. More recent results of Gross [8] and Recht [16] provide 
sharper conditions. For example, Recht [16] showed that, if we observe
$n$ entries of a matrix $A_0 \in \mathbb{R}^{m_1 \times m_2}$ with locations uniformly sampled
at random, then under "incoherence conditions" the exact recovery is
possible with high probability if $n > C r (m_1 + m_2) \log^2 m_2$ with some
constant $C > 0$ and $r = \mathrm{rank}(A_0)$.

In a more realistic setting the observed entries are corrupted by noise.
This question has been recently addressed by several authors (see, e.g.,
[3, 9, 17, 14, 15, 12, 13, 6, 11]). These methods rely on the knowledge
or a pre-estimation of the standard deviation $\sigma$ of the noise. Estimation
of $\sigma$ is non-trivial in large-scale problems. The estimator
that we propose in the present paper eliminates the need to know or to
pre-estimate $\sigma$. It is inspired (but leads to a different analysis) by the
square-root lasso estimator proposed for the linear regression model by
Belloni et al. in [1]. We show that, up to a logarithmic factor, our estimator
achieves optimal rates of convergence under the Frobenius risk.
Thus, it has the same prediction performance as previously proposed
estimators which rely on the knowledge of $\sigma$.

This paper is organized as follows. In Section 2 we set notations, 
introduce our model - the trace regression model and our estimator. 
In Section 3 (Theorem 2), we prove a general oracle inequality for the 
prediction error for the trace regression model. 

In Section 4, we apply Theorem 2 to the case of matrix completion
under uniform sampling at random (USR). We propose a choice
of the regularization parameter $\lambda$ for our estimator which is independent
of $\sigma$. The main result, Theorem 6, shows that in the case of USR
matrix completion and under some mild conditions that link the rank
and the "spikiness" of $A_0$, up to a constant, the prediction risk of our
estimator is comparable to the sharpest bounds obtained until now.
For more details see Section 4.

In Section 5, we apply our idea to the problem of matrix regression,
which is yet another special case of trace regression. Previously, the
problem of matrix regression with unknown noise variance was considered
in [2, 7]. These two papers study rank-penalized estimators.
Bunea et al. [2], who first introduced the idea of such estimators, propose
an unbiased estimator of $\sigma$ which requires an assumption on the
dimensions of the problem. This assumption excludes an interesting
case, the case when the sample size is smaller than the number of covariates.
The method proposed in [7] can be applied to this last case
under a condition on the rank of the unknown matrix $A_0$. Our method,
unlike the method of [2], can be applied to the case when the sample
size is smaller than the number of covariates, and our condition is weaker
than the conditions obtained in [7]. For more details see Section 5.

2. Preliminaries 

2.1. Model. Let $A_0 \in \mathbb{R}^{m_1 \times m_2}$ be an unknown matrix, and consider
the observations $(X_i, Y_i)$ satisfying the trace regression model

(2.1)   $Y_i = \mathrm{tr}(X_i^T A_0) + \sigma \xi_i, \quad i = 1, \dots, n.$

Here, $\sigma > 0$ is the unknown standard deviation. The noise variables $\xi_i$
are independent, identically distributed with law $\Phi$ such that

(2.2)   $\mathbb{E}_\Phi(\xi_i) = 0, \quad \mathbb{E}_\Phi(\xi_i^2) = 1,$

$X_i$ are random matrices of dimension $m_1 \times m_2$, and $\mathrm{tr}(A)$ denotes
the trace of the matrix $A$.

We consider the problem of estimating $A_0$. Our main motivation
is the high-dimensional setting, which corresponds to $m_1 m_2 \gg n$, with
low rank matrices $A_0$.

The trace regression model is a quite general model which contains
as particular cases a number of interesting problems. Let us give two
examples which we will consider in more detail in this paper.

• Matrix Completion. Assume that the design matrices $X_i$ are
i.i.d. uniformly distributed on the set

(2.3)   $\mathcal{X} = \{ e_j(m_1) e_k^T(m_2),\ 1 \le j \le m_1,\ 1 \le k \le m_2 \},$

where $e_l(m)$ are the canonical basis vectors in $\mathbb{R}^m$. Then, the
problem of estimating $A_0$ coincides with the problem of matrix
completion under uniform sampling at random (USR).

• Matrix regression. The matrix regression model is given by

(2.4)   $U_i = V_i A_0 + E_i, \quad i = 1, \dots, l,$

where $U_i$ are $1 \times m_2$ vectors of response variables, $V_i$ are $1 \times m_1$
vectors of predictors, $A_0$ is an unknown $m_1 \times m_2$ matrix of
regression coefficients and $E_i$ are random $1 \times m_2$ vectors of
noise with independent entries and mean zero.

We can equivalently write this model as a trace regression
model. Let $U_i = (U_{ik})_{k=1,\dots,m_2}$, $E_i = (E_{ik})_{k=1,\dots,m_2}$ and $Z_{ik}^T = e_k(m_2) V_i$,
where $e_k(m_2)$ are the $m_2 \times 1$ vectors of the canonical
basis of $\mathbb{R}^{m_2}$. Then, we can write (2.4) as

$U_{ik} = \mathrm{tr}(Z_{ik}^T A_0) + E_{ik}, \quad i = 1, \dots, l \text{ and } k = 1, \dots, m_2.$
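
The two reductions above can be checked numerically. The following is a minimal sketch, assuming NumPy and zero-based indices; the helper names `completion_design` and `regression_design` are hypothetical and are introduced only for illustration. It builds one design matrix of each kind and evaluates the trace in (2.1).

```python
import numpy as np

def completion_design(j, k, m1, m2):
    """Basis matrix e_j(m1) e_k(m2)^T: a single 1 at position (j, k)."""
    X = np.zeros((m1, m2))
    X[j, k] = 1.0
    return X

def regression_design(V_i, k, m2):
    """Matrix Z_ik (m1 x m2) such that Z_ik^T = e_k(m2) V_i for a row vector V_i."""
    e_k = np.zeros((m2, 1))
    e_k[k, 0] = 1.0
    return (e_k @ V_i.reshape(1, -1)).T

rng = np.random.default_rng(0)
m1, m2 = 4, 3
A0 = rng.standard_normal((m1, m2))

# Matrix completion: tr(X^T A0) reads off a single entry of A0.
X = completion_design(2, 1, m1, m2)
observed_entry = np.trace(X.T @ A0)

# Matrix regression: tr(Z_ik^T A0) is the k-th coordinate of V_i A0.
V_i = rng.standard_normal(m1)
Z = regression_design(V_i, 2, m2)
response_coord = np.trace(Z.T @ A0)
```

In both cases the noiseless part of the observation is a linear functional of $A_0$, which is all that the trace regression model requires.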

2.2. Notation. For any matrices $A, B \in \mathbb{R}^{m_1 \times m_2}$, define the scalar
product

$\langle A, B \rangle = \mathrm{tr}(A^T B).$

For $0 < q \le \infty$ the Schatten-q (quasi-)norm of the matrix $A$ is defined
by

$\|A\|_q = \Big( \sum_{j=1}^{\min(m_1, m_2)} \sigma_j(A)^q \Big)^{1/q}$ for $0 < q < \infty$ and $\|A\|_\infty = \sigma_1(A),$

where $(\sigma_j(A))_j$ are the singular values of $A$ ordered decreasingly.
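
As a small illustration of this definition, the sketch below computes Schatten norms from the singular values with NumPy. The function name `schatten_norm` is an assumption made for this example; $q = 1$ gives the nuclear norm, $q = 2$ the Frobenius norm and $q = \infty$ the operator norm.

```python
import numpy as np

def schatten_norm(A, q):
    """Schatten-q (quasi-)norm of A computed from its singular values."""
    s = np.linalg.svd(A, compute_uv=False)  # sorted in decreasing order
    if np.isinf(q):
        return float(s[0])                  # largest singular value
    return float(np.sum(s ** q) ** (1.0 / q))

A = np.array([[3.0, 0.0], [0.0, 4.0]])
nuclear = schatten_norm(A, 1)         # sum of singular values
frobenius = schatten_norm(A, 2)       # Euclidean norm of the entries
operator = schatten_norm(A, np.inf)   # largest singular value
```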
We summarize the notations which we use throughout this paper:

(2.5)   $\mathbf{X} = \frac{\mu^2}{n} \sum_{i=1}^n Y_i X_i$ and $M = \mu^{-2} (\mathbf{X} - A_0)$;

(2.6)   $\Delta = \frac{\|M\|_\infty}{\|M\|_2}$ and $\Delta_\infty = \|M\|_\infty$;

$\partial G$ is the subdifferential of $G$;
$S^\perp$ is the orthogonal complement of $S$;
$P_S$ is the projector on the linear vector subspace $S$;
$F(A) = \|A - \mathbf{X}\|_2 + \lambda \|A\|_1$;
$\|A\|_{\sup} = \max_{i,j} |a_{ij}|$ where $A = (a_{ij})$.

2.3. Estimator. In [13], the authors propose the following estimator
for the trace regression model

(2.7)   $\hat{A}^{\lambda} = \arg\min_{A \in \mathbb{R}^{m_1 \times m_2}} \Big\{ \|A\|^2_{L_2(\Pi)} - \Big\langle \frac{2}{n} \sum_{i=1}^n Y_i X_i, A \Big\rangle + \lambda \|A\|_1 \Big\}.$

Here, $\lambda > 0$ is a regularization parameter, $\|A\|^2_{L_2(\Pi)} = \frac{1}{n} \sum_{i=1}^n \mathbb{E}(\langle A, X_i \rangle^2)$
and $\Pi = \frac{1}{n} \sum_{i=1}^n \Pi_i$ where $\Pi_i$ are the distributions of $X_i$.

If the following assumption of restricted isometry in expectation is
satisfied

for some $\mu > 0$, $\|A\|^2_{L_2(\Pi)} = \mu^{-2} \|A\|_2^2,$

then (2.7) has a particularly simple form:

(2.8)   $\hat{A}^{\lambda} = \arg\min_{A} \{ \|A - \mathbf{X}\|_2^2 + \lambda \mu^2 \|A\|_1 \}$

where

(2.9)   $\mathbf{X} = \frac{\mu^2}{n} \sum_{i=1}^n Y_i X_i.$

In the first part of the present paper we study the following estimator

(2.10)   $\hat{A}_{\lambda,\mu} = \arg\min_{A} \{ \|A - \mathbf{X}\|_2 + \lambda \|A\|_1 \}.$

In order to simplify our notations we will write $\hat{A} = \hat{A}_{\lambda,\mu}$. Note that
the first part of our estimator coincides with the square root of the
data-dependent term in (2.8). This is similar to the principle used to
define the square-root lasso for the usual vector regression model, see
[1]. Theorem 2 gives an oracle bound on the prediction error of $\hat{A}$.
This bound is obtained for an arbitrary $\mu$ and does not rely on the
knowledge of the distributions of $X_i$. We apply Theorem 2 to matrix
completion, taking $\mu^2 = m_1 m_2$.
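
The paper does not give an algorithm for (2.10). The following is only a sketch of one possible way to compute the minimizer. It relies on an assumption that is not stated in the text: both terms of the objective are unitarily invariant, so a minimizer can be sought among matrices sharing the singular vectors of $\mathbf{X}$, and the problem reduces to soft-thresholding the singular values $d_j$ of $\mathbf{X}$ at a level $\tau = \lambda \|\hat{A} - \mathbf{X}\|_2$. The function name `sqrt_nuclear_estimator` and the search over the candidate rank `r` are hypothetical choices for this illustration.

```python
import numpy as np

def sqrt_nuclear_estimator(X, lam):
    """Sketch: minimize ||A - X||_2 + lam * ||A||_1 (Frobenius + nuclear norm).

    Stationarity gives singular values (d_j - tau)_+ with tau = lam * ||A - X||_2.
    If exactly r singular values exceed tau, then
        tau^2 * (1 - lam^2 * r) = lam^2 * sum_{j > r} d_j^2,
    so we try r = 0, 1, ... and keep the first self-consistent candidate.
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    sq = d ** 2
    # tail[r] = sum of d_j^2 over the singular values not among the r largest
    tail = np.concatenate([np.cumsum(sq[::-1])[::-1], [0.0]])
    for r in range(len(d) + 1):
        if lam ** 2 * r >= 1.0:      # the rank cannot exceed 1 / lam^2
            break
        tau = lam * np.sqrt(tail[r] / (1.0 - lam ** 2 * r))
        upper = d[r - 1] if r > 0 else np.inf
        lower = d[r] if r < len(d) else 0.0
        if lower <= tau < upper:     # exactly r singular values exceed tau
            s = np.maximum(d - tau, 0.0)
            return (U * s) @ Vt
    raise RuntimeError("no consistent threshold found (numerical ties)")

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 5))
lam = 0.8
A_hat = sqrt_nuclear_estimator(X, lam)
```

When $\lambda^2 \mathrm{rank}(\mathbf{X}) < 1$ the search ends with $\tau = 0$ and returns $\mathbf{X}$ itself, which is the case treated through the subdifferential of the norm in the proof of Lemma 1. The threshold never depends on $\sigma$, which is the point of the square-root construction.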

In the second part of the present paper, dedicated to the matrix regression
problem, we consider a new estimator inspired by the same idea,
namely

(2.11)   $\hat{A} = \arg\min_{A} \{ \|U - VA\|_2 + \lambda \|VA\|_1 \}.$

Note that in (2.11) we penalize by the nuclear norm of $VA$, rather
than by the nuclear norm of $A$ as in (2.7).

3. General oracle inequalities 

In this section, in Theorem 2, we provide a general oracle inequality
for the prediction error of our estimator. The proof of Theorem 2 is
based on the ideas of the proof of Theorem 1 in [13]. However, as the
statistical structure of our estimator is different from that of the estimator
proposed in [13], the proof requires several modifications and
additional information on the behavior of the estimator. This information
is given in Lemmas 1 and 3. In particular, Lemma 1 provides a
bound on the rank of our estimator in the general setting of the trace
regression model.

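Lemma 1.

   $\mathrm{rank}(\hat{A}) \le 1/\lambda^2.$

Proof. That $\hat{A}$ is the minimum of (2.10) implies that $0 \in \partial F(\hat{A})$. We
will use the fact that the subdifferential of the convex function $A \mapsto \|A\|_1$
is the following set of matrices (cf. [20])

(3.1)   $\partial \|A\|_1 = \Big\{ \sum_{j=1}^{\mathrm{rank}(A)} u_j(A) v_j^T(A) + P_{S_1^\perp(A)} W P_{S_2^\perp(A)} : \|W\|_\infty \le 1 \Big\},$

where $u_j(A)$ and $v_j(A)$ are respectively the left and right orthonormal singular
vectors of $A$, $S_1(A)$ is the linear span of $\{u_j(A)\}$ and $S_2(A)$ is the linear
span of $\{v_j(A)\}$. For $\hat{A} \ne \mathbf{X}$, this implies that there exists a matrix $W$
such that $\|W\|_\infty \le 1$ and

(3.2)   $\frac{\hat{A} - \mathbf{X}}{\|\hat{A} - \mathbf{X}\|_2} = -\lambda \sum_{j=1}^{\mathrm{rank}(\hat{A})} u_j(\hat{A}) v_j^T(\hat{A}) - \lambda P_{S_1^\perp(\hat{A})} W P_{S_2^\perp(\hat{A})}.$

Calculating the $\|\cdot\|_2^2$ norm of both sides of (3.2) we get that
$1 \ge \lambda^2 \mathrm{rank}(\hat{A})$. When $\hat{A} = \mathbf{X}$, instead of the differential of $\|A - \mathbf{X}\|_2$
we use its subdifferential. In (3.2) the term $\frac{\hat{A} - \mathbf{X}}{\|\hat{A} - \mathbf{X}\|_2}$ is replaced by a
matrix $\tilde{W}$ such that $\|\tilde{W}\|_2 \le 1$ and we get again $1 \ge \lambda^2 \mathrm{rank}(\hat{A})$. □

Theorem 2. Suppose that $\frac{\rho}{\sqrt{2\,\mathrm{rank}(A_0)}} \ge \lambda \ge 3\Delta$ for some $\rho < 1$,
then

   $\|\hat{A} - A_0\|_2^2 \le \inf_{\sqrt{2\,\mathrm{rank}(A)} \le \rho/\lambda} \Big\{ (1-\rho)^{-1} \|A - A_0\|_2^2 + \Big( \frac{2\lambda\mu^2}{1-\rho} \Big)^2 \|M\|_2^2\, \mathrm{rank}(A) \Big\}.$

Proof. We need the following auxiliary result which is proven in the
Appendix.

Lemma 3. Suppose that $\frac{\rho}{\sqrt{\mathrm{rank}(A_0)}} \ge \lambda \ge 3\Delta$ for some $\rho < 1$, then

(3.3)   $\|\hat{A} - \mathbf{X}\|_2 \ge \Big( \frac{3 - \sqrt{1+\rho^2}}{3 + \sqrt{1+\rho^2}} \Big) \|A_0 - \mathbf{X}\|_2.$

If $\hat{A} = \mathbf{X}$, then (3.3) implies that $A_0 = \mathbf{X}$ and we get $\|\hat{A} - A_0\|_2 = 0$.
If $\hat{A} \ne \mathbf{X}$, a necessary condition of extremum in (2.10) implies that
there exists $\hat{V} \in \partial \|\hat{A}\|_1$ such that for any $A \in \mathbb{R}^{m_1 \times m_2}$

   $\Big\langle \frac{2(\hat{A} - \mathbf{X})}{2\|\hat{A} - \mathbf{X}\|_2} + \lambda \hat{V},\ \hat{A} - A \Big\rangle \le 0$

which yields

(3.4)   $2\langle \hat{A} - A_0, \hat{A} - A \rangle - 2\langle \mathbf{X} - A_0, \hat{A} - A \rangle + 2\lambda \|\hat{A} - \mathbf{X}\|_2 \langle \hat{V}, \hat{A} - A \rangle \le 0.$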

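By (3.1) we have the following representation for an arbitrary $V \in \partial \|A\|_1$:

(3.5)   $V = \sum_{j=1}^{r} u_j v_j^T + P_{S_1^\perp(A)} W P_{S_2^\perp(A)},$ with $r = \mathrm{rank}(A)$ and $\|W\|_\infty \le 1$;

for simplicity we write $u_j$ and $v_j$ instead of $u_j(A)$ and $v_j(A)$.

By the monotonicity of subdifferentials of convex functions we have
that $\langle \hat{V} - V, \hat{A} - A \rangle \ge 0$. Then (3.4) and $2\langle \hat{A} - A_0, \hat{A} - A \rangle = \|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 - \|A - A_0\|_2^2$ imply

(3.6)   $\|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 + 2\lambda \|\hat{A} - \mathbf{X}\|_2 \langle P_{S_1^\perp(A)} W P_{S_2^\perp(A)}, \hat{A} - A \rangle$
        $\le \|A - A_0\|_2^2 + 2\langle \mathbf{X} - A_0, \hat{A} - A \rangle - 2\lambda \|\hat{A} - \mathbf{X}\|_2 \Big\langle \sum_{j=1}^{r} u_j v_j^T, \hat{A} - A \Big\rangle.$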

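From the trace duality we get that there exists $W$ with $\|W\|_\infty \le 1$
such that

(3.7)   $\langle P_{S_1^\perp(A)} W P_{S_2^\perp(A)}, \hat{A} - A \rangle = \|P_{S_1^\perp(A)} (\hat{A} - A) P_{S_2^\perp(A)}\|_1 = \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1.$

For an $m_1 \times m_2$ matrix $B$ let $\mathbf{P}_A(B) = B - P_{S_1^\perp(A)} B P_{S_2^\perp(A)}$. Since
$\mathbf{P}_A(B) = P_{S_1^\perp(A)} B P_{S_2(A)} + P_{S_1(A)} B$
and $\mathrm{rank}(P_{S_i(A)} B) \le \mathrm{rank}(A)$ we have that $\mathrm{rank}(\mathbf{P}_A(B)) \le 2\,\mathrm{rank}(A)$.

Using the trace duality and triangle inequality we get

(3.8)   $\langle \mathbf{X} - A_0, \hat{A} - A \rangle \le \|\mathbf{X} - A_0\|_\infty \|\hat{A} - A\|_1$
        $\le \|\mathbf{X} - A_0\|_\infty \|\mathbf{P}_A(\hat{A} - A)\|_1 + \|\mathbf{X} - A_0\|_\infty \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1.$

Note that $\big\| \sum_{j=1}^{r} u_j v_j^T \big\|_\infty = 1$. Then, the trace duality implies

(3.9)   $\Big| \Big\langle \sum_{j=1}^{r} u_j v_j^T, \hat{A} - A \Big\rangle \Big| = \Big| \Big\langle \sum_{j=1}^{r} u_j v_j^T, \mathbf{P}_A(\hat{A} - A) \Big\rangle \Big| \le \|\mathbf{P}_A(\hat{A} - A)\|_1.$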

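Putting (3.7), (3.8), (3.9) into (3.6) we compute

(3.10)   $\|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 + 2\lambda \|\hat{A} - \mathbf{X}\|_2 \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1$
         $\le \|A - A_0\|_2^2 + 2\|\mathbf{X} - A_0\|_\infty \|\mathbf{P}_A(\hat{A} - A)\|_1$
         $\quad + 2\|\mathbf{X} - A_0\|_\infty \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1$
         $\quad + 2\lambda \|\hat{A} - \mathbf{X}\|_2 \|\mathbf{P}_A(\hat{A} - A)\|_1.$

Using (3.3), from (3.10), we derive

   $\|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 + 2\lambda \frac{3 - \sqrt{1+\rho^2}}{3 + \sqrt{1+\rho^2}} \|A_0 - \mathbf{X}\|_2 \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1$
   $\le \|A - A_0\|_2^2 + 2\|\mathbf{X} - A_0\|_\infty \|\mathbf{P}_A(\hat{A} - A)\|_1$
   $\quad + 2\|\mathbf{X} - A_0\|_\infty \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1$
   $\quad + 2\lambda \|\hat{A} - \mathbf{X}\|_2 \|\mathbf{P}_A(\hat{A} - A)\|_1.$

From the definition of $\lambda$ we get

(3.11)   $\|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 + 6 \frac{3 - \sqrt{1+\rho^2}}{3 + \sqrt{1+\rho^2}} \|A_0 - \mathbf{X}\|_\infty \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1$
         $\le \|A - A_0\|_2^2 + 2\|\mathbf{X} - A_0\|_\infty \|\mathbf{P}_A(\hat{A} - A)\|_1$
         $\quad + 2\|\mathbf{X} - A_0\|_\infty \|P_{S_1^\perp(A)} \hat{A} P_{S_2^\perp(A)}\|_1$
         $\quad + 2\lambda \|\hat{A} - \mathbf{X}\|_2 \|\mathbf{P}_A(\hat{A} - A)\|_1.$

Note that $6 \frac{3 - \sqrt{1+\rho^2}}{3 + \sqrt{1+\rho^2}} \ge 2$ for any $\rho < 1$. Thus, (3.11) yields

(3.12)   $\|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 \le \|A - A_0\|_2^2 + 2\|\mathbf{X} - A_0\|_\infty \|\mathbf{P}_A(\hat{A} - A)\|_1 + 2\lambda \|\hat{A} - \mathbf{X}\|_2 \|\mathbf{P}_A(\hat{A} - A)\|_1.$

Now, using the triangle inequality and the fact that

   $\|\mathbf{P}_A(\hat{A} - A)\|_1 \le \sqrt{2\,\mathrm{rank}(A)}\, \|\hat{A} - A\|_2,$

from (3.12) we get

(3.13)   $\|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 \le \|A - A_0\|_2^2 + 2\big( \|\mathbf{X} - A_0\|_\infty + \lambda \|\mathbf{X} - A_0\|_2 \big) \sqrt{2\,\mathrm{rank}(A)}\, \|\hat{A} - A\|_2$
         $\quad + 2\lambda \|\hat{A} - A_0\|_2 \sqrt{2\,\mathrm{rank}(A)}\, \|\hat{A} - A\|_2.$

From the definition of $\lambda$ we get that $\|\mathbf{X} - A_0\|_\infty \le \lambda \|\mathbf{X} - A_0\|_2 / 3$. For
$A$ such that $\lambda \sqrt{2\,\mathrm{rank}(A)} \le \rho$, (3.13) implies

   $\|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 \le \|A - A_0\|_2^2 + \frac{8}{3} \lambda \|\mathbf{X} - A_0\|_2 \sqrt{2\,\mathrm{rank}(A)}\, \|\hat{A} - A\|_2 + 2\rho \|\hat{A} - A_0\|_2 \|\hat{A} - A\|_2.$

Using $2ab \le a^2 + b^2$ twice we finally compute

   $(1-\rho) \|\hat{A} - A_0\|_2^2 + \|\hat{A} - A\|_2^2 \le \|A - A_0\|_2^2 + \rho \|\hat{A} - A\|_2^2 + \frac{8}{3} \lambda \|\mathbf{X} - A_0\|_2 \sqrt{2\,\mathrm{rank}(A)}\, \|\hat{A} - A\|_2$

and

   $(1-\rho) \|\hat{A} - A_0\|_2^2 \le \|A - A_0\|_2^2 + \frac{32}{9(1-\rho)} \lambda^2 \|\mathbf{X} - A_0\|_2^2\, \mathrm{rank}(A),$

which, together with $\|\mathbf{X} - A_0\|_2 = \mu^2 \|M\|_2$, implies the statement of Theorem 2. □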

4. Matrix Completion 

In this section we apply the general oracle inequality of Theorem
2 to the model of USR matrix completion. Assume that the design
matrices $X_i$ are i.i.d. uniformly distributed on the set $\mathcal{X}$ defined in (2.3).
This implies that

   $\frac{1}{n} \sum_{i=1}^n \mathbb{E}(\langle A, X_i \rangle^2) = (m_1 m_2)^{-1} \|A\|_2^2$

for all matrices $A \in \mathbb{R}^{m_1 \times m_2}$ and we take $\mu^2 = m_1 m_2$.
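
To make this choice of $\mu^2$ concrete, here is a minimal sketch (assuming NumPy; the names `proxy_matrix` and `l2_pi_norm_sq` are hypothetical) that forms the matrix $\mathbf{X} = \frac{m_1 m_2}{n} \sum_i Y_i X_i$ from sampled positions and evaluates the $L_2(\Pi)$ norm under uniform sampling, which is simply the average of the squared entries.

```python
import numpy as np

def proxy_matrix(rows, cols, Y, m1, m2):
    """X = (m1*m2/n) * sum_i Y_i X_i with X_i = e_{rows[i]} e_{cols[i]}^T."""
    n = len(Y)
    acc = np.zeros((m1, m2))
    np.add.at(acc, (rows, cols), Y)   # repeated positions are accumulated
    return (m1 * m2 / n) * acc

def l2_pi_norm_sq(A):
    """(1/n) sum_i E<A, X_i>^2 under USR: the mean of the squared entries."""
    return float(np.mean(A ** 2))

rng = np.random.default_rng(0)
m1, m2 = 4, 3
A0 = rng.standard_normal((m1, m2))

# Sanity check: observing every entry exactly once, without noise, gives X = A0.
rows, cols = np.indices((m1, m2)).reshape(2, -1)
X_full = proxy_matrix(rows, cols, A0[rows, cols], m1, m2)

# A noisy USR sample of size n; X_usr is an unbiased but very rough proxy of A0.
n, sigma = 8, 0.5
r_s = rng.integers(0, m1, size=n)
c_s = rng.integers(0, m2, size=n)
Y = A0[r_s, c_s] + sigma * rng.standard_normal(n)
X_usr = proxy_matrix(r_s, c_s, Y, m1, m2)
```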

We will consider the case of sub-Gaussian noise and matrices with
uniformly bounded entries. We suppose that the noise variables $\xi_i$ are
such that

(4.1)   $\mathbb{E}(\xi_i) = 0, \quad \mathbb{E}(\xi_i^2) = 1$

and there exists a constant $K$ such that

(4.2)   $\mathbb{E}[\exp(t\xi_i)] \le \exp(t^2/2K)$

for all $t > 0$. Normal $N(0, 1)$ random variables are sub-Gaussian with
$K = 1$ and (4.2) implies that $\xi_i$ has Gaussian type tails:

   $\mathbb{P}\{|\xi_i| > t\} \le 2\exp\{-t^2/2K\}.$

Note that condition $\mathbb{E}\xi_i^2 = 1$ implies that $K \le 1$.
Let $a$ denote a constant such that

(4.3)   $\|A_0\|_{\sup} \le a.$

In order to specify the value of the regularization parameter $\lambda$, we
need to estimate $\Delta$ (defined in (2.6)) with high probability. In what
follows we will denote by $c$ a numerical constant whose value can vary
from one expression to the other and is independent of $n, m_1, m_2$.
Set $m = m_1 + m_2$, $m_1 \wedge m_2 = \min(m_1, m_2)$ and $m_1 \vee m_2 = \max(m_1, m_2)$.
The following bound is a consequence of Lemmas 2 and 3 in [13].

Lemma 4. For $n > 8(m_1 \wedge m_2) \log^2 m$, with probability at least $1 - 3/m$,
one has

(4.4)   $\Delta_\infty \le (c_* \sigma + 2a) \sqrt{\frac{2\log(m)}{(m_1 \wedge m_2)\, n}},$

where $c_*$ is a numerical constant which depends only on $K$.
If $\xi_i \sim N(0, 1)$, then we can take $c_* = 6.5$.

Proof. The bound (4.4) is stated in Lemmas 2 and 3 in [13]. A closer
inspection of the proof of Proposition 2 in [12] gives an estimation on
$c_*$ in the case of Gaussian noise. For more details see the appendix. □

The following lemma, proven in the appendix, provides bounds on
$\|M\|_2$.

Lemma 5. Suppose that $4n \le m_1 m_2$. Then, for $M$ defined in (2.5),
there exist absolute constants $(c_1, c_2)$ such that, with probability at least
$1 - 2/(m_1 m_2) - c_1 \exp\{-c_2 n\}$, one has






(ii) 



111 



n 
i=l 



> 



I A 



0II2 



> 



4IIA 



|2 
OII2 



nmim2 (mim2)^' 



|M1I,>5 



n 



Recall that the condition on $\lambda$ in Theorem 2 is that $\lambda \ge 3\Delta$. Using
Lemma 4 and the lower bounds on $\|M\|_2$ given by Lemma 5 we can
choose



(4.5) 



A = 2c, 



logm 

m\ A m2 



4a 



2 n log m 



mi A 777,2 



EyiX^ 



i=l 



With this choice of $\lambda$, the assumption of Theorem 2 that
$\frac{\rho}{\sqrt{2\,\mathrm{rank}(A_0)}} \ge \lambda$ takes the form



(4.6) 



y/rankp^ 



> 2c, 



logm 

nil A 1712 



4a 



^rank(Ao) 
2 n log m 1 



> 



nil A m2 



J:y^x^ 



i=l 



Using (ii) of Lemma 5 we get that (4.6) is satisfied with high probability
if



logm 4 a^niini2 / 2 logm 



(4.7) ^ - > 

A/rank(Ao) *\miAm2 ' IIA0II2 \m1Am2 

Note that as mi and m2 are large, the first term in the rhs of (4.7) is 
small. Thus (4.7) is essentially equivalent to 



(4.8)   $\rho \ge 4\sqrt{2}\, \alpha_{sp} \sqrt{\mathrm{rank}(A_0)} \sqrt{\frac{2\log m}{m_1 \wedge m_2}},$

where $\alpha_{sp} = \frac{\sqrt{m_1 m_2}\, \|A_0\|_{\sup}}{\|A_0\|_2}$ is the spikiness ratio of $A_0$. The notion of
"spikiness" was introduced by Negahban and Wainwright in [15]. We
have that $1 \le \alpha_{sp} \le \sqrt{m_1 m_2}$ and it is large for "spiky" matrices, i.e.
matrices where some "large" coefficients emerge as spikes among very
"small" coefficients. For instance, $\alpha_{sp} = 1$ if all the entries of $A_0$ are
equal to some constant and $\alpha_{sp} = \sqrt{m_1 m_2}$ if $A_0$ has only one non-zero
entry.
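
The two extreme cases just mentioned are easy to verify numerically. The following sketch assumes NumPy; the function name `spikiness_ratio` is introduced here only for illustration.

```python
import numpy as np

def spikiness_ratio(A):
    """sqrt(m1*m2) * ||A||_sup / ||A||_2, with ||.||_2 the Frobenius norm."""
    m1, m2 = A.shape
    return float(np.sqrt(m1 * m2) * np.max(np.abs(A)) / np.linalg.norm(A))

m1, m2 = 5, 8
flat = np.full((m1, m2), 3.0)    # all entries equal: least spiky
spike = np.zeros((m1, m2))
spike[2, 4] = -7.0               # a single non-zero entry: most spiky

ratio_flat = spikiness_ratio(flat)
ratio_spike = spikiness_ratio(spike)
```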

Condition (4.8) is a kind of trade-off between "spikiness" and rank. If
$\alpha_{sp}$ is bounded by a constant, then, up to a logarithmic factor, $\mathrm{rank}(A_0)$
can be of the order $m_1 \wedge m_2$, which is its maximal possible value. If our
matrix is "spiky", then we need low rank. To give some intuition let
us consider the case of square matrices. Typically, matrices with both
high spikiness ratio and high rank look almost diagonal. Thus, under
uniform sampling and if $n \ll m_1 m_2$, with high probability we do not
observe diagonal (i.e. non-zero) elements.

Theorem 6. Let the set of conditions (4.1) - (4.3) be satisfied and
$\lambda$ be as in (4.5). Assume that $8(m_1 \wedge m_2) \log^2 m < n \le \frac{m_1 m_2}{4}$ and
that (4.7) holds for some $\rho < 1$. Then, there exist absolute constants
$(c_1, c_2)$ such that, with probability at least $1 - 4/m - c_1 \exp\{-c_2 n\}$,

(4.9)   $\frac{1}{m_1 m_2} \|\hat{A} - A_0\|_2^2 \le C_* \frac{(m_1 \vee m_2)}{n}\, \mathrm{rank}(A_0) \log m,$

where $C_* = \frac{16 \big( 2c_*^2 \sigma^2 + (18 + 2c_*^2) a^2 \big)}{(1-\rho)^2}$.



Proof. This is a consequence of Theorem 2 for A = Aq. From (4.5) we 
get 

(4.10) 

/ 

8(mim2)^ 



\A-A. 



0112 



< 



\ 



4 log m / 2 n log m 

h 2aW 

mi A m2 V mi A m2 



1 



v 



n 
i=l 



X ||M||2rank(Ao). 
Using triangle inequality and (ii) of Lemma 5 we compute 



(4.11) 



|M||2< 



1 " 

E 



n 



1=1 



mim2 



I A 



0II2 



< 



i=l 



Using (i) of Lemma 5 and (4.11), from (4.10) we get 



\A-A. 



0II2 



< 



16 log(m)(mim2) 



2 c* 



1^1 — p)2(mi A m2 
We then use $\|A_0\|_2^2 \le a^2 m_1 m_2$ to obtain

   $\frac{\|\hat{A} - A_0\|_2^2}{m_1 m_2} \le \frac{16 \log(m) (m_1 \vee m_2)}{(1-\rho)^2\, n} \big( 2c_*^2 \sigma^2 + (18 + 2c_*^2) a^2 \big)\, \mathrm{rank}(A_0).$

This completes the proof of Theorem 6. □

Theorem 6 guarantees that the normalized Frobenius error
$\frac{\|\hat{A} - A_0\|_2}{\sqrt{m_1 m_2}}$
of the estimator $\hat{A}$ is small whenever $n > C (m_1 \vee m_2) \log(m)\, \mathrm{rank}(A_0)$
with a constant $C$ large enough. This quantifies the sample size $n$ necessary
for successful matrix completion from noisy data with unknown
variance of the noise. This sample size is the same as in the case of
known variance of the noise.

In order to compare our bounds to those obtained in past works on
noisy matrix completion, we will start with the paper of Keshavan et
al. [9]. Under a sampling scheme different from ours (sampling without
replacement) and sub-Gaussian errors, the estimator proposed in [9]
satisfies, with high probability, the following bound

(4.12)   $\frac{1}{m_1 m_2} \|\hat{A} - A_0\|_2^2 \lesssim k^4 \sqrt{\alpha}\, \frac{(m_1 \vee m_2)}{n}\, \mathrm{rank}(A_0) \log n.$

The symbol $\lesssim$ means that the inequality holds up to multiplicative
numerical constants, $k = \sigma_{\max}(A_0)/\sigma_{\min}(A_0)$ is the condition number
and $\alpha = (m_1 \vee m_2)/(m_1 \wedge m_2)$ is the aspect ratio. Comparing (4.12)
and (4.9), we see that our bound is better: it does not involve the
multiplicative coefficient $k^4 \sqrt{\alpha}$, which can be big.

Wainwright et al. in [15] propose an estimator which, in the case of
USR matrix completion and sub-exponential noise, satisfies

(4.13)   $\frac{1}{m_1 m_2} \|\hat{A} - A_0\|_2^2 \lesssim \alpha_{sp}^2\, \frac{m}{n}\, \mathrm{rank}(A_0) \log m.$

Here $\alpha_{sp}$ is the spikiness ratio of $A_0$. For $\alpha_{sp}$ bounded by a constant,
(4.13) gives the same bound as Theorem 6. The construction of $\hat{A}$ in
[15] requires prior information on the spikiness ratio of $A_0$ and on $\sigma$.
This is not the case for our estimator, which is completely data-driven.

The estimator proposed by Koltchinskii et al. in [13] achieves the
same bound as ours. In addition to prior information on $\|A_0\|_{\sup}$, their
method also requires prior information on $\sigma$. In the case of Gaussian
errors, this rate of convergence is optimal (cf. Theorem 6 of [13]) for
the class of matrices $\mathcal{A}(r, a)$ defined as follows: for given $r$ and $a$, for
any $A_0 \in \mathcal{A}(r, a)$ the rank of $A_0$ is supposed not to be larger than $r$
and all the entries of $A_0$ are supposed to be bounded in absolute value
by $a$.

5. Matrix Regression 

In this section we apply our method to matrix regression. Recall
that the matrix regression model is given by

(5.1)   $U_i = V_i A_0 + E_i, \quad i = 1, \dots, l,$

where $U_i$ are $1 \times m_2$ vectors of response variables; $V_i$ are $1 \times m_1$ vectors of
predictors; $A_0$ is an unknown $m_1 \times m_2$ matrix of regression coefficients;
$E_i$ are random $1 \times m_2$ noise vectors with independent entries $E_{ij}$. We
suppose that $E_{ij}$ has mean zero and unknown standard deviation $\sigma$.

Set $V = (V_1^T, \dots, V_l^T)^T$, $U = (U_1^T, \dots, U_l^T)^T$ and $E = (E_1^T, \dots, E_l^T)^T$.
We define the following estimator of $A_0$:

   $\hat{A} = \arg\min_{A} \{ \|U - VA\|_2 + \lambda \|VA\|_1 \},$

where $\lambda > 0$ is a regularization parameter.

Let $\mathcal{P}_V$ denote the orthogonal projector on the linear span of the
columns of the matrix $V$ and let $\mathcal{P}_V^\perp = 1 - \mathcal{P}_V$. Note that

   $V^T \mathcal{P}_V^\perp = 0$

and that $A^T V = 0$ (which means that the columns of $A$ are orthogonal to
the columns of $V$) implies $\mathcal{P}_V A = 0$.

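The following lemma is the counterpart of Lemma 1 in the present
setting.

Lemma 7.

   $\mathrm{rank}(V\hat{A}) \le 1/\lambda^2.$

Proof. That $\hat{A}$ is the minimum of (2.11) implies that $0 \in \partial G(\hat{A})$ where

   $G(A) = \|U - VA\|_2 + \lambda \|VA\|_1.$

Note that the subdifferential of the convex function $A \mapsto \|VA\|_1$ is the
following set of matrices

   $\partial \|VA\|_1 = V^T \Big\{ \sum_{j=1}^{\mathrm{rank}(VA)} u_j(VA) v_j^T(VA) + P_{S_1^\perp(VA)} W P_{S_2^\perp(VA)} : \|W\|_\infty \le 1 \Big\}$

where $S_1(VA)$ is the linear span of $\{u_j(VA)\}$ and $S_2(VA)$ is the linear
span of $\{v_j(VA)\}$.

If $\hat{A}$ is such that $V\hat{A} \ne U$, we obtain that there exists a matrix $W$
such that $\|W\|_\infty \le 1$ and

   $V^T \frac{V\hat{A} - U}{\|V\hat{A} - U\|_2} = -\lambda V^T \Big\{ \sum_{j=1}^{\mathrm{rank}(V\hat{A})} u_j(V\hat{A}) v_j^T(V\hat{A}) + P_{S_1^\perp(V\hat{A})} W P_{S_2^\perp(V\hat{A})} \Big\}$

which implies

(5.2)   $V^T \Big( \frac{V\hat{A} - U}{\|V\hat{A} - U\|_2} + \lambda \Big\{ \sum_{j=1}^{\mathrm{rank}(V\hat{A})} u_j(V\hat{A}) v_j^T(V\hat{A}) + P_{S_1^\perp(V\hat{A})} W P_{S_2^\perp(V\hat{A})} \Big\} \Big) = 0.$

Recall that $V^T B = 0$ implies that $\mathcal{P}_V B = 0$, and we get
from (5.2)

(5.3)   $\mathcal{P}_V \frac{V\hat{A} - U}{\|V\hat{A} - U\|_2} = -\lambda \Big\{ \sum_{j=1}^{\mathrm{rank}(V\hat{A})} \mathcal{P}_V u_j(V\hat{A}) v_j^T(V\hat{A}) + \mathcal{P}_V \big[ P_{S_1^\perp(V\hat{A})} W P_{S_2^\perp(V\hat{A})} \big] \Big\}.$

By the definition of the singular vectors we have that $V\hat{A}\,(v_j(V\hat{A})) = \sigma_j(V\hat{A})\, u_j(V\hat{A})$
and we compute

   $\mathcal{P}_V V\hat{A}\,(v_j(V\hat{A})) = \sigma_j(V\hat{A})\, \mathcal{P}_V u_j(V\hat{A});$

using $\mathcal{P}_V V\hat{A}\,(v_j(V\hat{A})) = V\hat{A}\,(v_j(V\hat{A})) = \sigma_j(V\hat{A})\, u_j(V\hat{A})$ and $\sigma_j \ne 0$
we get

(5.4)   $\mathcal{P}_V u_j(V\hat{A}) = u_j(V\hat{A})$

and we obtain from (5.3)

(5.5)   $\mathcal{P}_V \frac{V\hat{A} - U}{\|V\hat{A} - U\|_2} = -\lambda \Big\{ \sum_{j=1}^{\mathrm{rank}(V\hat{A})} u_j(V\hat{A}) v_j^T(V\hat{A}) + \mathcal{P}_V \big[ P_{S_1^\perp(V\hat{A})} W P_{S_2^\perp(V\hat{A})} \big] \Big\}.$

Note that for any $w$ such that $\langle w, u_j(V\hat{A}) \rangle = 0$, (5.4) implies that

(5.6)   $\langle \mathcal{P}_V w, u_j(V\hat{A}) \rangle = \langle w, u_j(V\hat{A}) \rangle = 0.$

By definition, $P_{S_1^\perp(V\hat{A})}$ projects on the subspace orthogonal to the
linear span of $\{u_j(V\hat{A})\}$. Thus, (5.6) implies that $\mathcal{P}_V P_{S_1^\perp(V\hat{A})}$ also
maps into the subspace orthogonal to the linear span of $\{u_j(V\hat{A})\}$.

Calculating the $\|\cdot\|_2^2$ norm of both sides of (5.5) we get that
$1 \ge \lambda^2 \mathrm{rank}(V\hat{A})$. When $V\hat{A} = U$, instead of the differential of $\|U - VA\|_2$
we use its subdifferential. □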

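Minor modifications in the proof of Theorem 2 yield the following
result. We set $\Delta' = \frac{\|\mathcal{P}_V E\|_\infty}{\|E\|_2}$.

Theorem 8. Suppose that $\frac{\rho}{\sqrt{2\,\mathrm{rank}(VA_0)}} \ge \lambda \ge 3\Delta'$ for some $\rho < 1$,
then

   $\|V(\hat{A} - A_0)\|_2^2 \le \inf_{\sqrt{2\,\mathrm{rank}(VA)} \le \rho/\lambda} \Big\{ (1-\rho)^{-1} \|V(A - A_0)\|_2^2 + \Big( \frac{2\lambda}{1-\rho} \Big)^2 \|E\|_2^2\, \mathrm{rank}(VA) \Big\}.$

Proof. The proof follows the lines of the proof of Theorem 2 and it is
given in the appendix. □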

To get the oracle inequality in a closed form it remains to specify the
value of the regularization parameter $\lambda$ such that $\lambda \ge 3\Delta'$. This requires
some assumptions on the distribution of the noise $(E_{ij})_{i,j}$. We will
consider the case of Gaussian errors. Suppose that $E_{ij} = \sigma \xi_{ij}$ where
$\xi_{ij}$ are normal $N(0, 1)$ random variables. In order to estimate $\|\mathcal{P}_V E\|_\infty$
we will use the following result proven in [2].

Lemma 9 ([2], Lemma 3). Let $r = \mathrm{rank}(V)$ and assume that $E_{ij}$ are
independent $N(0, \sigma^2)$ random variables. Then

   $\mathbb{E}(\|\mathcal{P}_V E\|_\infty) \le \sigma(\sqrt{m_2} + \sqrt{r})$

and

   $\mathbb{P}\{ \|\mathcal{P}_V E\|_\infty \ge \mathbb{E}(\|\mathcal{P}_V E\|_\infty) + \sigma t \} \le \exp\{-t^2/2\}.$
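
The expectation bound in Lemma 9 can be illustrated by a small simulation. This is only a Monte Carlo sketch under assumed dimensions (`l`, `m1`, `m2`), an assumed value of `sigma` and an arbitrary Gaussian design `V`; it assumes NumPy and is not part of the argument.

```python
import numpy as np

rng = np.random.default_rng(0)
l, m1, m2, sigma = 30, 5, 20, 2.0      # illustrative sizes, chosen arbitrarily
V = rng.standard_normal((l, m1))
P_V = V @ np.linalg.pinv(V)            # projector onto the column span of V
r = np.linalg.matrix_rank(V)

# Average operator norm of P_V E over independent Gaussian noise matrices E.
spec = []
for _ in range(200):
    E = sigma * rng.standard_normal((l, m2))
    spec.append(np.linalg.norm(P_V @ E, 2))
mean_spec = float(np.mean(spec))

bound = sigma * (np.sqrt(m2) + np.sqrt(r))   # right-hand side of Lemma 9
```

In such runs the empirical mean stays below the bound, and both scale linearly in `sigma`, which is why the ratio $\Delta'$ does not depend on the noise level.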

We use Bernstein's inequality to get a bound on $\|E\|_2$. Let $\alpha < 1$.
With probability at least $1 - 2\exp\{-c\alpha^2 l m_2\}$, one has

(5.7)   $(1+\alpha)\, \sigma \sqrt{l m_2} \ge \|E\|_2 \ge (1-\alpha)\, \sigma \sqrt{l m_2}.$

Let $\beta > 0$ and take $t = \beta(\sqrt{m_2} + \sqrt{r})$ in Lemma 9. Then, using (5.7)
we can take

(5.8) A = (l+/)(v^v^). 

(1 — a)\/lm2 

Put 7 = — > 1. Thus, condition — ^ > A gives 

1-a v^2rank(VAo) ~ 

(5.9) rank(yAo) < ^'^""^ 



(a/^+ aA) 

and we get the following result. 

Theorem 10. Assume that $\xi_{ij}$ are independent $N(0, 1)$. Pick $\lambda$ as in
(5.8). Assume that (5.9) is satisfied for some $\rho < 1$, $\alpha < 1$ and $\beta > 0$.
Then, with probability at least $1 - 2\exp\{-c(m_2 + r)\}$ we have that

   $\|V(\hat{A} - A_0)\|_2^2 \lesssim \sigma^2 (m_2 + r)\, \mathrm{rank}(VA_0).$

The symbol $\lesssim$ means that the inequality holds up to a multiplicative
numerical constant and $c$ denotes a numerical constant that depends on
$\alpha$ and $\beta$.

Proof. This is a consequence of Theorem 8. □

Let us now compare condition (5.9) with the conditions obtained in
[2, 7]. The method proposed in [2] requires $m_2(l - r)$ to be large, which
holds whenever $l \gg r$ or $l - r \ge 1$ and $m_2$ is large. This condition
excludes an interesting case $l = r \ll m_2$. On the other hand (5.9) is
satisfied for $l = r \ll m_2$ if

   $\mathrm{rank}(A_0) \lesssim l$

where we used $\mathrm{rank}(VA_0) \le r \wedge \mathrm{rank}(A_0)$.

The method of [7] requires the following condition to be satisfied 

(5.10) rank(Ao) < ^ ^ 



C2 (a/^+ Vr)' 



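with some constants $C_1 < 1$ and $C_2 > 1$. As $\mathrm{rank}(VA_0) \le \mathrm{rank}(A_0)$,
condition (5.9) is weaker than (5.10). Note also that, contrary
to [7], our results are valid for all $A_0$ provided that

   $r \le \frac{\rho^2\, l\, m_2}{2\gamma^2 (\sqrt{m_2} + \sqrt{r})^2}.$

For large $m_2 \gg l$, this condition roughly means that $l > cr$ for some
constant $c$.

Appendix A. Proof of Lemma 3

If $A_0 = \mathbf{X}$, then we have trivially $\|\hat{A} - \mathbf{X}\|_2 \ge 0$.
If $A_0 \ne \mathbf{X}$, by the convexity of the function $A \mapsto \|A - \mathbf{X}\|_2$, we have

(A.1)   $\|\hat{A} - \mathbf{X}\|_2 - \|A_0 - \mathbf{X}\|_2 \ge \frac{\langle A_0 - \mathbf{X}, \hat{A} - A_0 \rangle}{\|A_0 - \mathbf{X}\|_2}$
        $\ge -\frac{\|A_0 - \mathbf{X}\|_\infty}{\|A_0 - \mathbf{X}\|_2} \|\hat{A} - A_0\|_1$
        $\ge -\frac{\|A_0 - \mathbf{X}\|_\infty}{\|A_0 - \mathbf{X}\|_2} \sqrt{\mathrm{rank}(\hat{A}) + \mathrm{rank}(A_0)}\, \|\hat{A} - A_0\|_2.$

Using Lemma 1, the bound $\frac{\rho}{\sqrt{\mathrm{rank}(A_0)}} \ge \lambda$ and the triangle inequality,
from (A.1) we get

(A.2)   $\|\hat{A} - \mathbf{X}\|_2 - \|A_0 - \mathbf{X}\|_2 \ge -\frac{\sqrt{1+\rho^2}}{\lambda} \frac{\|A_0 - \mathbf{X}\|_\infty}{\|A_0 - \mathbf{X}\|_2} \big( \|\hat{A} - \mathbf{X}\|_2 + \|A_0 - \mathbf{X}\|_2 \big).$

Note that $\frac{\|A_0 - \mathbf{X}\|_\infty}{\lambda \|A_0 - \mathbf{X}\|_2} \le 1/3$ which finally leads to

   $\Big( 1 + \frac{\sqrt{1+\rho^2}}{3} \Big) \|\hat{A} - \mathbf{X}\|_2 \ge \Big( 1 - \frac{\sqrt{1+\rho^2}}{3} \Big) \|A_0 - \mathbf{X}\|_2.$

This completes the proof of Lemma 3.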

Appendix B. Proof of Lemma 4 

Our goal is to get a numerical estimation on $c_*$ in the case of Gaussian
noise. Let $Z_i = \xi_i (X_i - \mathbb{E}X_i)$ and



Gz = max 



n 1/2 ^ 

1=1 no «=i 



1/2- 



nil A m2 



The constant c* comes up in the proof of Lemma 2 in [13] in the 
estimation of 



Ai 



^ n 1 " 

-y2^^X, < -Ve.(A,-EA, 

i=l nr, 1=1 



n 
i=l 



A standard application of Markov's inequality gives that, with probability
at least $1 - 1/m$,

(B.1)   $\Big\| \frac{1}{n} \sum_{i=1}^n \xi_i\, \mathbb{E}X_i \Big\|_\infty \le 2 \sqrt{\frac{\log m}{n\, m_1 m_2}}.$

In [13], the authors estimate $\big\| \frac{1}{n} \sum_{i=1}^n \xi_i (X_i - \mathbb{E}X_i) \big\|_\infty$ using [12, Proposition 2].
To get a numerical estimation on $c_*$, we follow the lines of the
proof of [12, Proposition 2]. In order to simplify notations, we write
$\|\cdot\|_\infty = \|\cdot\|$ and we consider the case of Hermitian matrices of size $m'$.
Its extension to rectangular matrices is straightforward via self-adjoint
dilation, cf., for example, 2.6 in [18].

Let $Y_n = \sum_{i=1}^n Z_i$. In the proof of [12, Proposition 2], after following the
standard derivation of the classical Bernstein inequality and using the
Golden-Thompson inequality, the author derives the following bound


(B.2) 
and 

(B.3) 



P(||K|| > t) < 2mV 



-At I 



\Zl II 71 



lEe 



XZi I 



< 1 + A^ 



1 - A||Zi| 



11^1 1 



Using that ||Zi|| < 2|,^j|, from (B.3), we compute 
(B.4) 



Ee 



< 1 + A^ 



E [(X, - EX^Y] E i 



e2A|5.| _ 1 _ 2A|ei 



< 1 + X^alE 



V 2! ■ 3! 
Assume that A < 1, then (B.4) implies 



+ ■ ■ 



||Ee^^i|| < 1 + AVfEe^l^'l < 1 + 2AV|e2 < exp{2AV|e2}. 

Using this bound, from (B.2) we get 

^{\\Yn\\ >t) < 2m'exp{-At + 2AV|e^}. 

It remains now to minimize the last bound with respect to A G (0, 1) 
to obtain that 



P(||>^n|| >t)< 2m'exp 



where we supposed that n is large enough. 

t2 



Putting 2m' exp 



l/(2m'), wegett = 2e 



2 log(2m')n 

nil A m2 



Using (B.1) we compute the following bound on $c_*$:

   $c_* \le 2e + 1 \le 6.5.$

This completes the proof of Lemma 4.

Appendix C. Proof of Lemma 5

Let $\epsilon_i = \sigma \xi_i$. To prove (i) we compute
(C.l) 

(M, M) = + f 1 - If: (^0, x.f + if:^ 



1=1 1=1 
I II 

n 



-] ^Yl (^0' + ^ S e, {Ao, X,) (X„ X,) 
mim2 ) n^t<j 

J=l V / 

^ y ^ 

IV 



III 

+ ^ E e,e, (X„ X,) + ^ S (Aq, X,) (Aq, X,) (X,, X,). 



V VI 



We estimate each term in (C.1) separately with high probability.
The estimates we give on this probability involve an absolute constant
$c > 0$.

/in \ Mnll^ 

I : We have that E — (^o,^i)^ = ^ and |(Ao,Xi)| < 



a. 

Using Hoeffding's inequality, we get that, with probability at
least

1 - 2exp{-2(TV(8a)^} 



nrnvrri'}, on n^'- — ' nmimo on 

/ 1 " 

II: $\epsilon_i^2$ are sub-exponential random variables and $\mathbb{E}\Big( \frac{1}{n^2} \sum_{i=1}^n \epsilon_i^2 \Big) = \frac{\sigma^2}{n}$.
Using Bernstein's inequality for sub-exponential random
variables (cf. [19, Proposition 16]) we get that, with probability
at least
1 -2exp|-cnmin a'^K /S"^ | 

1 " 



2 2 1 " 2 2 

— + ^> > — ^• 

n Sn n^^-^ n on 

i=l 



/ 2 " \ 
III: We have that E I — ^ {Aq, Xj) j =0, using Hoeffding's type 

y?7. {—I J 

inequality for sub-Gaussian random variables (cf. [19, Proposi- 
tion 10]) we get that, with probability at least 1— eexp {— ca^/in/a^} 

2 o " 2 

^>A5:(Ao,X.)e.>-f. 
8n n^ ^-^ on 

i=l 
IV: This term has expectation 0. We use the
following lemma which is proven in the Appendix.
Lemma 11. Suppose that $n \le m_1 m_2$. With probability at least
$1 - \frac{2}{m_1 m_2}$,

   $\sum_{i<j} \langle X_i, X_j \rangle \le n.$

Lemma 11 and Hoeffding's type inequality imply that, with 
probability at least 1 — 2/mim2 — eexp {—ca'^nK/a'^} 

|- > - E e, {Ao,Xj) {X,,X,) > 

8n n^K] 8n 

V: We have that E ( — E e^e,- (Xi, X,) ) = 0. Using Bernstein type 

inequalities for sub-exponential random variables and Lemma 

11 we get that, with probability at least 1—2 exp |— cnmin ay/K/8 

2 , , 

— > —Ee,e,{X„Xj) > - — . 
8n n^Kj 8n 

VI: We compute that 
E^i^S (Ao,X,)(Ao,X,)(X„X,) 

1 ^ 1 ^ lUoll^ 



-S.(E((Ao,X,)X,),E((Ao,X,)XO) = -^.S 

< ll^oll^ 



m\'m2) 



Using Lemma 11 and Hoeffding's inequality, we get that, with 
probability at least 1 — 2/mim2 — 2 exp {— 2cr^r2/(8a)2} 



-1 S. (Ao, X,) (Ao, X,) (X„ X,) < ^ ^ 



n^iy^i \m\m2) 8n 

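To obtain the lower bound, note that, for $i \ne j$, $\langle X_i, X_j \rangle \ne 0$ iff
$X_i = X_j$. This implies that $\sum_{i \ne j} \langle A_0, X_i \rangle \langle A_0, X_j \rangle \langle X_j, X_i \rangle \ge 0$. We use
that $2n \le m_1 m_2$ to get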

IIA0II2 , ( , 2n \ 1^ 



(mim2)^ 



\ / T — 1 



1=1 

Putting the lower bounds in II — V together we compute from (C.l^ 

n ,,2 

M ^ > — • 
To obtain the upper bound, we use the upper bounds in I - VI.
From (C.1) we get



IMII^ < IM^^ 



1^0 



12 , <; 2 



f Polio a' 



+ — 



mim2j nmim2 on \nm1m2 n 



where we used that $2n \le m_1 m_2$. This completes the proof of part (i)
in Lemma 5.

To prove (ii) we use that $\langle X_i, X_i \rangle = 1$ and, for $i \ne j$, $\langle X_i, X_j \rangle \ne 0$ iff $X_i = X_j$.
We compute



1=1 i=l 



/n n \ l" 2 

1 



+ (Ao,X,)'(X„X,) 



+ ^ S e, (Ao, X,) (X„ X,) + ^ S e,e, (X„ X,) . 



This implies that 
(C.2) 



e,; 



i=l j=l 



'i=l 



I 



i=l 

II 



III 



+ 4^6, (Ao, X,) (X„ X,) + ^ S e,e, (X„ X,). 



IV 



Using the lower bounds for I-V we get from (C.2)

|2 



/ n n \ 

\ j=l i=l / 



0112 



nmim2 



which proves part (ii) of Lemma 5.

(iii) is a consequence of (ii). For $4n < m_1 m_2$, (ii) implies



^ / n n 

— / ^riXi,^FiX, 



> 



l^ol 



i=l i=l I l"^l"^2j 

Now we complete the proof of part (iii) of Lemma 5 using that 

mim2 



|M||2> 



n 



i=l 

Appendix D. Proof of Lemma 11

Recall that for $i \neq j$, $X_i$ and $X_j$ are independent. We compute the
expectation

$$\mathbb{E}\Big(\sum_{i<j}\langle X_i,X_j\rangle\Big) = \sum_{i<j}\langle \mathbb{E}X_i,\mathbb{E}X_j\rangle = \frac{n(n-1)}{2m_1m_2}$$



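and the variance

$$\mathbb{E}\Big(\sum_{i<j}\langle X_i,X_j\rangle\Big)^2 - \Big(\mathbb{E}\sum_{i<j}\langle X_i,X_j\rangle\Big)^2 = \sum_{\substack{i<j \\ i'<j'}}\mathbb{E}\big(\langle X_i,X_j\rangle\langle X_{i'},X_{j'}\rangle\big) - \sum_{\substack{i<j \\ i'<j'}}\mathbb{E}\big(\langle X_i,X_j\rangle\big)\,\mathbb{E}\big(\langle X_{i'},X_{j'}\rangle\big).$$

When $i, j, i', j'$ are all distinct, $\mathbb{E}(\langle X_i,X_j\rangle\langle X_{i'},X_{j'}\rangle)$ is canceled by the
corresponding term in $\sum_{i<j,\, i'<j'}\mathbb{E}(\langle X_i,X_j\rangle)\,\mathbb{E}(\langle X_{i'},X_{j'}\rangle)$. Then, it remains
to consider the following five cases: (1) $i = i'$ and $j = j'$; (2) $i = i'$ and
$j \neq j'$; (3) $i \neq i'$ and $j = j'$; (4) $i = j'$ and $j \neq i'$; (5) $i' = j$ and $j' \neq i$.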

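case (1) Note that $\langle X_i,X_j\rangle$ takes only two values, 0 or 1, which implies
that

$$\mathbb{E}\big(\langle X_i,X_j\rangle^2\big) = \mathbb{E}\big(\langle X_i,X_j\rangle\big) = \frac{1}{m_1m_2}.$$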



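cases (2)-(5) In these four cases, the calculation reduces to computing
$\mathbb{E}(\langle X_i,X_k\rangle\langle X_k,X_j\rangle)$ for $i \neq j$ and $k \notin \{i,j\}$. Note that
$P_{X_k} = \langle\,\cdot\,,X_k\rangle X_k$ is the orthogonal projector on the vector space
spanned by $X_k$. We compute

$$\mathbb{E}P_{X_k} = \frac{1}{m_1m_2}\,\mathrm{Id}$$

where $\mathrm{Id}$ is the identity application on $\mathbb{R}^{m_1\times m_2}$. Then, we get

$$\mathbb{E}\big(\langle\langle X_i,X_k\rangle X_k, X_j\rangle\big) = \mathbb{E}\big(\langle P_{X_k}(X_i), X_j\rangle\big) = \big\langle \mathbb{E}(P_{X_k})(\mathbb{E}X_i), \mathbb{E}X_j\big\rangle = \frac{1}{m_1m_2}\langle \mathbb{E}X_i,\mathbb{E}X_j\rangle = \frac{1}{(m_1m_2)^2}.$$

These terms are canceled by the corresponding terms in
$\sum_{i<j,\, i'<j'}\mathbb{E}(\langle X_i,X_j\rangle)\,\mathbb{E}(\langle X_{i'},X_{j'}\rangle)$, as $\mathbb{E}(\langle X_i,X_k\rangle)\,\mathbb{E}(\langle X_k,X_j\rangle) = \frac{1}{(m_1m_2)^2}$.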
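The identity for $\mathbb{E}P_{X_k}$ used in cases (2)-(5) can be checked directly. The following is a sketch under the assumption that, as in the matrix completion setting, each $X_k$ is drawn uniformly from the canonical basis matrices of $\mathbb{R}^{m_1\times m_2}$; the notation $e_a e_b^\top$ for these basis matrices is introduced here only for illustration.

```latex
% Assumption: X_k is uniform on { e_a e_b^T : 1 <= a <= m_1, 1 <= b <= m_2 }.
\begin{align*}
\mathbb{E}P_{X_k}(B)
  &= \frac{1}{m_1 m_2}\sum_{a=1}^{m_1}\sum_{b=1}^{m_2}
     \langle B, e_a e_b^\top\rangle\, e_a e_b^\top
     && \text{(average over the $m_1 m_2$ equally likely values of $X_k$)}\\
  &= \frac{1}{m_1 m_2}\sum_{a=1}^{m_1}\sum_{b=1}^{m_2} B_{ab}\, e_a e_b^\top
     && \text{(since $\langle B, e_a e_b^\top\rangle = B_{ab}$)}\\
  &= \frac{1}{m_1 m_2}\, B .
\end{align*}
```

Since this holds for every $B \in \mathbb{R}^{m_1\times m_2}$, the expected projector is the identity scaled by $1/(m_1m_2)$.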

Finally we get that

$$\mathbb{E}\Big(\sum_{i<j}\langle X_i,X_j\rangle\Big)^2 - \Big(\mathbb{E}\sum_{i<j}\langle X_i,X_j\rangle\Big)^2 \le \frac{n(n-1)}{2m_1m_2}.$$
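The mean and variance computed above can be verified exactly on a tiny instance by enumerating all outcomes. This is only a sanity-check sketch: it assumes each $X_i$ is uniform on `N` positions (with `N` standing in for $m_1 m_2$), and the function name `exact_moments` and the sizes used are hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_moments(n, N):
    """Exact mean and variance of S = sum_{i<j} 1{X_i = X_j},
    where X_1..X_n are i.i.d. uniform on N positions (N ~ m1*m2)."""
    total = Fraction(0)
    total_sq = Fraction(0)
    # Enumerate all N**n equally likely outcomes.
    for draws in product(range(N), repeat=n):
        s = sum(
            1
            for i in range(n)
            for j in range(i + 1, n)
            if draws[i] == draws[j]
        )
        total += s
        total_sq += s * s
    outcomes = N ** n
    mean = total / outcomes
    var = total_sq / outcomes - mean ** 2
    return mean, var


# Hypothetical small sizes, chosen so that enumeration is cheap.
n, N = 4, 6
mean, var = exact_moments(n, N)
pairs = Fraction(n * (n - 1), 2)

print("mean matches n(n-1)/(2N):", mean == pairs / N)
print("variance bounded by n(n-1)/(2N):", var <= pairs / N)
```

Because the pair indicators are uncorrelated (this is what the cancellation in cases (2)-(5) expresses), only case (1) contributes to the variance, which is why it stays below $n(n-1)/(2m_1m_2)$.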

The Bienaymé-Tchebychev inequality implies that

$$P\Big(\sum_{i<j}\langle X_i,X_j\rangle > n\Big) \le \frac{n(n-1)}{2m_1m_2\Big(n - \dfrac{n(n-1)}{2m_1m_2}\Big)^2} \le \frac{2}{m_1m_2}$$

when $m_1m_2 > n$. This completes the proof of Lemma 11.
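The statement of Lemma 11 can also be probed numerically. The sketch below assumes that each $X_i$ is drawn uniformly from the $m_1 m_2$ canonical basis matrices, so that $\sum_{i<j}\langle X_i, X_j\rangle$ counts pairs of observations that hit the same entry. The function names and the sizes are hypothetical, and a simulation of this kind illustrates the bound without proving it.

```python
import random
from collections import Counter


def collision_count(n, m1, m2, rng):
    """Return sum_{i<j} <X_i, X_j> for n uniform draws of matrix entries.

    Each X_i is identified with the index of its single nonzero entry,
    so <X_i, X_j> is 1 when the two draws coincide and 0 otherwise.
    """
    draws = [rng.randrange(m1 * m2) for _ in range(n)]
    counts = Counter(draws)
    # An entry observed c times contributes c*(c-1)/2 coinciding pairs.
    return sum(c * (c - 1) // 2 for c in counts.values())


def estimate_failure_rate(n, m1, m2, trials=2000, seed=0):
    """Monte Carlo estimate of P(sum_{i<j} <X_i, X_j> > n)."""
    rng = random.Random(seed)
    failures = sum(
        collision_count(n, m1, m2, rng) > n for _ in range(trials)
    )
    return failures / trials


# Hypothetical sizes satisfying n < m1*m2.
n, m1, m2 = 20, 10, 10
rate = estimate_failure_rate(n, m1, m2)
print("empirical failure rate:", rate)
print("bound 2/(m1*m2):", 2 / (m1 * m2))
```

In this regime the expected number of coinciding pairs, $n(n-1)/(2m_1m_2)$, is far below $n$, so the empirical failure rate is typically much smaller than the Chebyshev-type bound $2/(m_1m_2)$.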



Appendix E. Proof of Theorem 8 

We need the following auxiliary result, which corresponds to Lemma
3, and which is proven in Appendix F.



Lemma 12. Suppose that 
then 



P 







VA-U 


A 



^yIank{VAQ j 

3- vTT7 
3 + JiT7 



> A > 3 A' for some p < 1, 



\E\\o. 



We resume the proof of Theorem 8. If VA = U, then Lemma 12 



implies that VAq = U and we get 



V A-A^ 



0. 



If VA ^ U, a necessary condition of extremum in (2.11) implies that 
there exists aW e d\\VA\\i such that for any A e R'^i^'^^ 



2(VA-U,V A- A 



VA-U 

that is 
(E.l) 

2(v (A- Ao) ,V (A- A 



x(w,v (^A- A^'^ <0 



2(U- VAq, ViA-A 



2X 



VA-U 



W,V [A- A)) <0 



rank(y^) 



Foraiy^G| u,{VA)vJ{VA) + Ps.^yA^WPs.^vA)--\\W\\oo<l 

by the monotonicity of subdifferentials of convex functions we have that 



W -Wa.V [A- A]) > 0. 

Then, (E.l) and 

2(v (A-Ao) ,V (A-a)) = V(A-Ao 



VIA- A 



V{A-A,) 
imply
(E.2) 
v(A-Ao 



VA-A 



2A 



VA-U 



VA-A 



< \\V{A-Ao)\\: + 2{E,v(A-A 



- 2A 



VA-U 



rank(yA) 

Uj{VA)vj{VAf, V[A-A 



By trace duality we can pick $W$ with $\|W\|_\infty \le 1$ such that
(E.3) 



Ps.^yA)WPs.^yA).v(^A-A 



W, Ps^ (VA) V[A-A Ps. (VA) 



PsHVA)V(A-A]Ps±^VA) 



Let $P_{VA}(B) = B - P_{S_1^\perp(VA)}\, B\, P_{S_2^\perp(VA)}$. Then, using the trace duality
and the triangle inequality we get

(E.4) 



E,VA-A 



VvE.V A- A]) < WVvE 



VA-A 



+ \\VvE\ 



VA 



VA-A 



Ps^iVA)V {A- A) Ps^{VA) 



Note that 
implies 



rank(VA) 

E u,{VA)vJiVA) 



(E.5) 

rank(yA) 

S Ui{VA)vJ{VA),V [A- A 
j=l 



1. Hence the trace duality 



rank(\/A) 

S u,{VA)v,{VAf,PvA 



VA-A 



< 



VA 



VA-A 



Plugging (E.3), (E.4) and (E.5) into (E.2) we obtain 
(E.6) 



V A-Ar 



V[A-A 

2 



2A 



VA-U 



< \\V{A-A,)\\l + 2\\VvE,^ 



Psi{VA)V[A-A\Ps^^yj,^ 



+ 2\\VvE\ 



VA 



VA-A 



2A 



VA-U 



VA 



VA-A 

Using Lemma 12, from (E.6) we compute



V A-Af 



VA-A 



2A 



1 + 



\\Eh 



(VA) (VA) 



< \\V{A-Ao)\\t, + 2\\VvEl 



Ps^{VA)V {A- A) Ps^(yA) 



VA 



VIA- A 



+ 2A 



VA-U 



VA 



VA-A 



From the definition of $\lambda$ we get



(E.7) 



V A-Ar 



6 



V i^A-A 
3- 



WVvEl 



(VA) (VA) 



< \\ViA-Ao)r + 2\\VvEl 



Es^{VA)V [A - Aj Ps^{vA) 



+ n'PvEl 



VA 



VIA- A 



+ 2A 



VA-U 



VA 



VA-A 



Note that 6 ^ ^ > 2 for any p < 1. Thus, (E.7) implies

3 + VI + P^ 



(E.8) 



V A-Ar 



VA-A 



< \\V{A-Ao 



+ 2\\VvEl 



VA 



VA-A 



2A 



VA-U 



VA 



VA-A 



Using 



VA-U 



< 



VA - VAo 



VAn - U 



and the fact that 



VA 



VA-A 



< v/2rank(VA) 



VA-A 
from (E.8) we compute
(E.9)
V A-Af



VA-A 



< \\V{A-Ao)\\l 
+ 2Av/2rank(\/A) 



VlA-Ao 



VA-A 



+ 2X\\E\\2^/2Iank{V A) 



VA-A 



+ 2\\VvE\\^y/2mnk{VA) 



VA-A 



From the definition of A we get that \\VvE\\^ < ApHs/S and Aa/2 rank(VA) < 
p. This implies that



V A-Ar 



VA-A 



< \\V{A-Ao) 



+ 8/3A||^||2V2rank(VA) 



2p 



V A-Ao 



V\^A-A 

v(A-a 



Using $2ab \le a^2 + b^2$ twice we finally compute



;i-p) 



V A-Ao 



+ 



VA-A 



< \\viA-Ao)\\' + p 



VA-A 



8/3X\\E\\2^/2Tank{VA) 



VA-A 



and 



;i-p) 



V A-Ar 



< \\ViA-Ao)\\' + 



4A^ 



\E\\liank{VA) 



which implies the statement of Theorem 8. 



Appendix F. Proof of Lemma 12 

If $VA_0 = U$, then we have trivially $\|V\hat{A} - U\|_2 \ge 0$. If $VA_0 \neq U$,
by the convexity of the function $A \mapsto \|VA - U\|_2$, we have



(F.l) 

VA-U 



\VAo - f/|L > 



VviE) ,V (A- A. 



> 



(vAo-U,v(^A-Ao 
\\VAo-U\\, 

. Il^v-(i^)||. 



\VAo-U\\, 

ry(^)lloo 
II^IU 



1^1 



rank(VAo) + Tank{VA) 



V{A-Ao 
v(A-Ao 

Using the bound



P 



> A, Lemma 7 and the triangle inequal- 



^Tank{VA) 



ity from (F.l) we get 
VA-U -\\VAo-U\\^> 




the definition of A we have — — — < 1 /3 which finally leads to 

A -C/ L 





This completes the proof of Lemma 12. 

Acknowledgements. It is a pleasure to thank A. Tsybakov for
introducing me to this problem and for illuminating discussions.



References

[1] Belloni, A., Chernozhukov, V. and Wang, L. (2011) Square-root
Lasso: Pivotal Recovery of Sparse Signals via Conic Programming.
Biometrika, to appear.

[2] Bunea, F., She, Y. and Wegkamp, M. (2011) Optimal selection of
reduced rank estimators of high-dimensional matrices. Annals of
Statistics, 39, 1282-1309.

[3] Candes, E. J. and Plan, Y. (2009) Matrix completion with noise.
Proceedings of IEEE.

[4] Candes, E. J. and Recht, B. (2009) Exact matrix completion via
convex optimization. Foundations of Computational Mathematics,
9(6), 717-772.

[5] Candes, E. J. and Tao, T. (2009) The power of convex relaxation:
Near-optimal matrix completion. IEEE Trans. Inform. Theory,
56(5), 2053-2080.

[6] Gaïffas, S. and Lecué, G. (2010) Sharp oracle inequalities for the
prediction of a high-dimensional matrix. IEEE Transactions on
Information Theory, to appear.

[7] Giraud, C. (2011) Low rank Multivariate regression. Electronic
Journal of Statistics, 5, 775-799.

[8] Gross, D. (2011) Recovering low-rank matrices from few
coefficients in any basis. IEEE Transactions on Information
Theory, 57(3), 1548-1566.

[9] Keshavan, R.H., Montanari, A. and Oh, S. (2010) Matrix
completion from noisy entries. Journal of Machine Learning Research,
11, 2057-2078.

[10] Keshavan, R.H., Montanari, A. and Oh, S. (2010) Matrix
completion from a few entries. IEEE Trans. on Info. Th., 56(6),
2980-2998.

[11] Klopp, O. (2011) Rank penalized estimators for high-dimensional 
matrices. Electronic Journal of Statistics, 5, 1161-1183. 

[12] Koltchinskii, V. (2011) Von Neumann Entropy Penalization and 
Low Rank Matrix Estimation. Annals of Statistics, to appear. 

[13] Koltchinskii, V., Lounici, K. and Tsybakov, A. (2011) Nuclear 
norm penalization and optimal rates for noisy low rank matrix 
completion. Annals of Statistics, to appear. 

[14] Negahban, S. and Wainwright, M. J. (2011). Estimation of (near) 
low-rank matrices with noise and high-dimensional scaling. Annals 
of Statistics, 39, 1069-1097. 

[15] Negahban, S. and Wainwright, M. J. (2010). Restricted strong 
convexity and weighted matrix completion: Optimal bounds with 
noise. Arxiv: 1009.2118. 

[16] Recht, B. (2009) A simpler approach to matrix completion. Jour- 
nal of Machine Learning Research, to appear. 

[17] Rohde, A. and Tsybakov, A. (2011) Estimation of High- 
Dimensional Low-Rank Matrices. Annals of Statistics, 39, 887- 
930. 

[18] Tropp, J. A. (2011) User-friendly tail bounds for sums of random 
matrices. Found. Comput. Math., to appear. 

[19] Vershynin, R. (2012) Introduction to the non-asymptotic analysis
of random matrices. Chapter 5 of the book Compressed Sensing,
Theory and Applications, ed. Y. Eldar and G. Kutyniok. Cambridge
University Press.

[20] Watson, G. A. (1992) Characterization of the subdifferential of
some matrix norms. Linear Algebra Appl., 170, 33-45.

Laboratoire de Statistique, CREST and University Paris Dauphine,
CREST 3, Av. Pierre Larousse 92240 Malakoff France
E-mail address: olga.klopp@ensae.fr 



